splitting probabilities as a test of reaction coordinate choice in single-molecule experiments 
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To explain the observed dynamics in equilibrium single-molecule measurements of biomolecules, 
the experimental observable is often chosen as a putative reaction coordinate along which kinetic 
behavior is presumed to be governed by diffusive dynamics. Here, we invoke the splitting probability 
as a test of the suitability of such a proposed reaction coordinate. Comparison of the observed splitting 
probability with that computed from the kinetic model provides a simple test to reject poor reaction 
coordinates. We demonstrate this test for a force spectroscopy measurement of a DNA hairpin. 



A variety of new experimental techniques have made 
it possible to monitor the conformational fluctuations 
of single biological macromolecules under both equi- 
librium and nonequilibrium conditions. These experi- 
ments aim to probe the statistical dynamics and confor- 
mational substates relevant to folding and function. In a 
typical experiment, such as observation of the resonant 
energy transfer efficiency betw^een two fluorophores in- 
corporated into an RNA molecule [1], fluctuations of a 
spectroscopic observable in the absence of an external 
field are monitored. Other experiments allows the effect 
of an external biasing potential on the dynamics to be 
observed, as in an optical trap [2-4]. 

To describe the observed dynamics of the system, it is 
tempting to identify the observable with a reaction co- 
ordinate and construct a model in which the dynamics 
evolves by a diffusion process in an effective potential, 
such as by overdamped Langevin (also called "Brown- 
ian'O dynamics [5], 



x{t) = 



^^2D{x)R(t). 



(1) 



Here, x{t) is the time-dependent motion along the 
resolved coordinate, D{x) is the diffusion constant 
(often assumed to be a constant independent of x), 
(3 = {kBT)~^ is the inverse temperature, F{x) = 
—kBT\n7T{x) is the potential of mean force (PMF) de- 
fined in terms of the observed equilibrium probability 
density 7r(x), and R{t) is a Gaussian process with zero 
mean satisfying {R{t)R{f)) = S{t - t'). 

Many physical systems such as biomolecules exhibit 
strong metastabilities in the conformational degrees of 
freedom, resulting in the presence of two or more dis- 
crete conformational states in which the system re- 
mains for a for long time before transitioning to another 
metastable state [6]. While it is often easy to find an ob- 
servable X that is a suitable order ipammeter that allows 
these metastable states to be discriminated to some de- 
gree, it is generally difficult to find a good reaction coordi- 
nate so that dynamics along the resolved coordinate are 
well-described by Eq. 1. 

For data collected in a given single-molecule experi- 
ment, how can we determine whether the resolved co- 



ordinate provides a good reaction coordinate? Recent 
work on tests of reaction coordinate suitability in com- 
puter simulations has focused on the calculation of the 
committor or splitting probabilities, a concept dating back 
to Onsager [7]. This quantity, now extensively used in 
simulation studies of protein folding [8], represents the 
probability that a trajectory first encounters one absorb- 
ing boundary placed along the reaction coordinate be- 
fore another, given an initial microscopic state of the 
system. For suitable choices of reaction coordinate, the 
distribution of committor probabilities along an equi- 
librium ensemble of configurations restricted to a given 
value of the reaction coordinate will be closely grouped 
about a characteristic value [8-13]; indeed, ideal reac- 
tion coordinates organize committor isosurfaces in an 
ordered fashion along the reaction coordinate [14]. 

Unfortunately, tests based on evaluating distributions 
of committor values along cuts of the putative reaction 
coordinate are impossible to apply in a physical exper- 
iment, since there is no way to prepare the system in 
precisely the same microscopic configuration to probe 
the statistics of committor probabilities. Instead, we 
propose a simple alternative that is readily computable 
from observed equilibrium trajectories of the resolved 
coordinate: Comparison of the average committor along 
each value of the reaction coordinate evaluated by Eq. 1 
with the empirical average committor from the observed 
trajectory. 

Theory. Consider the placement of absorbing boundaries 
at a and b near the periphery of the observed range of 
the resolved coordinate x. For a diffusion process in 
one dimension governed by Eq. 1, the probability of first 
encountering a before b starting from x G [a, 6] can be 
shown to be [5, 14, 15], 



Pa{x) 



jldx' D{x')-^e^^^'^'^ 
jldx' D{x')-^e 



(2) 



The PMF along the resolved coordinate x, F{x), can be 
estimated from a single-molecule trajectory of sufficient 
length [4, 16] or from multiple trajectories under differ- 
ent equilibrium [17] or nonequilibrium [18, 19] condi- 
tions. The diffusion profile D{x) can be estimated in a 
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number of ways (such as the Bayesian scheme of Best 
and Hummer that allows simultaneous computation of 
both PMF and diffusion constant [15]), though it is com- 
monly assumed to be constant, in which case it cancels 
from both numerator and denominator. 

An empirical estimate of the splitting probability pA {x) 
can also be computed directly from an observed equi- 
librium trajectory x{t), t e [0, 7~] (also recently noted in 
Ref. [20]), 



.^^^^ ^ CdtS{y-x{t))cA{t) 
dtS{y - x{t)) 



(3) 



where we have defined the hitting function CA{t) in 
terms of x{t) as. 



CA{t) = 



1 if TAit) <rB{t) 
otherwise 



(4) 



where auxiliary functions r^(t) and rB{t) are defined as, 

TAit) = mf{t' > t : x{t) < a} 

rB{t) = mf{t' >t:x{t)>b}. (5) 

The hitting function ca (t) simply keeps track of whether 
x{t) will hit boundary a before b immediately following 
time t, and assumes the value of unity if so, and zero 
otherwise. In practice, the delta function S{y — x{t)) is 
replaced by some kernel function of finite width, such 
as a histogram bin. An estimate of pa{x) from multiple 
equilibrium trajectories can be produced by averaging 
the trajectories weighted by their lengths. 

Our proposed test is simple: By comparing the split- 
ting probability estimated from the PMF, pA{x)r with the 
empirical estimate of the splitting probability from the 
trajectory, pa{x), we can judge whether these quantities 
are obviously discrepant over the range x e [a, 6], which 
would indicate that x is a poor reaction coordinate. Note 
that this test is necessary, but not sufficient, for x to be a 
good reaction coordinate; agreement does not mean that 
the putative reaction coordinate is a true reaction coordi- 
nate. Nevertheless, the test may be sufficiently exacting 
to reject poor choices of reaction coordinate that are not 
immediately obvious by eye yet fail this comparison. 

When the observed coordinate is determined to be a 
poor reaction coordinate, the consequences of assum- 
ing it to be an adequate reaction coordinate depend 
on the precise nature of the information extracted from 
the single-molecule data. The consequences could be 
as simple as underestimating the rate constant for a 
two-state process or as subtle as inferring an erroneous 
mechanism for more complex processes. The most obvi- 
ous consequence is mistaking the location of the tran- 
sition state — the point where the splitting probability 
Pa = 0.5 — to be displaced from the free energy barrier 
in the potential of mean force. For systems like DNA 
hairpins and proteins, this can have consequences for 




FIG. 1. Two-dimensional model system and potentials of 
mean force. Upper left: Potential for the two-dimensional 
model system, with contours drawn every 5 ksT. x and y are 
poor reaction coordinates, while q = (x — y)/ \/2 (thick black 
line) is a good reaction coordinate. Other panels: Potentials of 
mean force in units of /c^T for projections onto x, y, and q. 



the interpretation of how "brittle'' or "compliant'' the 
conformational states are perceived to be. Notably, sim- 
ilar tests have been found to be useful in validating pu- 
tative reaction coordinate choices in computer simula- 
tions, despite the ability to inspect the atomic coordi- 
nates directly [21,22]. 

Model system. As an illustrative example, we consider 
the two-dimensional model system previously studied 
by Rhee and Pande [14], 

U{x, ^) = [1 - 0.5 tanh(y - x)]{x ^ y - (6) 
+ 0.2[((^-x)2-9)2+3(^-x)] 
+ l^e-^--^-^f-iy-^-^f _ 20e-("-^)'-(^-^)', 

pictured here in the upper-left panel of Fig. 1. Two stable 
states are present, located roughly at (x, ?/) -coordinates 
(4,1) and (1,4). At ksT = 5, the PMFs along both 
X and y clearly show two distinct wells separated by 
a barrier, and yet these coordinates are expected to be 
poor reaction coordinates individually; the coordinate 
q = {x — y)/ \f2 (Fig. 1, upper-left panel, red line), how- 
ever, which connects the two stable basins more directly, 
is known to be a good reaction coordinate at this temper- 
ature [14]. 

A Brownian dynamics trajectory of 10^ steps was gen- 
erated using the discretization of Eq. 1 by Ermak and 
Yeh [23, 24], with a diffusion constant of I) = 1 and 
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FIG. 2. Splitting probability tests for two-dimensional 
model system. For each choice of projected coordinate shown 
in Fig. 1, both the trajectory-derived empirical splitting proba- 
bility PA (solid black line) and the PMF-derived splitting prob- 
ability PA (dashed black line) are shown. Dark shading repre- 
sents a 68% confidence interval about pa, and light shading a 
95% confidence interval. 



timestep At = 0.1. This trajectory was projected onto ei- 
ther poor choices of reaction coordinate x and y, or good 
reaction coordinate q (Supplementary Fig. 1). For each 
projection, the potential of mean force was estimated 
from an empirical histogram, e.g. F{x) ^ —kBT\np{x) 
for the projection onto x, using 100 equally-sized bins. 
The PMF-derived splitting probability pa{x) was com- 
puted from F{x) using Eq. 2, and the empirical splitting 
probability pa{x) according to Eq. 3. To judge whether 
disagreement between these estimates was statistically 
meaningful, the statistical uncertainty in the empirical 
Pa{x) was estimated using by time-correlation analysis 
(see Supplementary Information). 

The results of this comparison assuming a uniform 
diffusion constant are shown in Fig. 2. The poor suitabil- 
ity of X and y as reaction coordinates is easily seen by the 
large discrepancy between the the splitting probability 
Pa computed from the PMF (dashed line) and the empir- 
ical splitting probability pa estimated from the trajecto- 
ries (solid line). However, the coordinate q = (x—y) / V^, 
previously identified by Rhee and Pande as being well- 
aligned with the true reaction coordinate at this tem- 
perature by sophisticated means not available to single- 
molecule experiments [14], agrees to within statistical 
error (shaded region). 

DNA hairpin force spectroscopy. To demonstrate the utility 
of our proposed splitting probability test in a real labo- 
ratory measurement, we performed the same analysis 
on a single-molecule trajectory of a DNA hairpin in a 
double optical trap, previously reported by Woodside et 
al. [4]. The hairpin, referred to as 30R50/T4 due to the 
content of a 30 bp stem-forming sequence, is attached 
by means of dsDNA handles to two polystyrene beads 
held in a passive all-optical constant-force clamp [2] at 
an external force that encourages hopping among closed 
and open conformations over the course of the exper- 
iment. Bead displacements in the trap were recorded 
with a sampling frequency of 25 kHz [4], and the bead- 
to-bead extension trajectory was analyzed here. 

Fig. 3 shows the observed trajectory of the molecular 



extension coordinate and corresponding splitting prob- 
ability analysis for a uniform diffusion constant. From 
this analysis, it is evident there is poor agreement be- 
tween pa{x) estimated from the PMF and the empirical 
Pa{x) estimated from the trajectory in the region of ex- 
tensions between 535 and 545 nm. This suggests that, at 
this external force, dynamics would be poorly-described 
by Brownian dynamics along the total molecular exten- 
sion coordinate using Eq. 1 and a uniform diffusion con- 
stant. 

Non-uniform diffusion. Recently, it has been suggested 
that non-uniformity of the diffusion constant along the 
resolved coordinate may have important ramifications 
for single-molecule biophysical experiments [15]. Could 
strong position-dependence of the diffusion constant 
D{x) may be responsible for the observed discrepancy 
in in Fig. 3? To judge whether non-uniform diffusion 
significantly impacted our test of reaction coordinate 
suitability, we used the Bayesian inference scheme pro- 
posed by Best and Hummer [15] to simultaneously com- 
pute position-dependent diffusion constant D{x) and 
potential of mean force F{x) for the systems considered 
here (Supplementary Figs. 2 and 3). Notably, the dif- 
fusion constant varies markedly with the bead-to-bead 
extension (Fig. 4, left), and the agreement of the PMF- 
derived Pa and empirical pa (Fig. 4, right) improves 
substantially. By contrast, repeating the reaction co- 
ordinate test for the 2D model system allowing for a 
position-dependent diffusion constant reveals only rel- 
atively minor variations in the estimated diffusion con- 
stant that result in no substantial change in which reac- 
tion coordinates are rejected by the test (Supplementary 
Fig. 2). Taken together, these data suggest a significant 
role for position-dependent diffusion in the DNA hair- 
pin system under force, in agreement with the theoreti- 
cal findings of Best and Hummer [15]. 
Discussion. We note that the reaction coordinate test pre- 
sented here only allows us to test a condition that is nec- 
essary, but not sufficient, for Brownian dynamics to ap- 
propriately describe the observed dynamics on a one- 
dimensional landscape determined by the PMF. This 
does not rule out the possibility of pathological cases 
where poor reaction coordinates go unnoticed because 
the average splitting probability at a particular value 
of the resolved coordinate matches the PMF-derived 
model, but the splitting probability distribution is not 
tightly peaked about its average value. Additionally, if 
multiple reactive channels exist that are otherwise indis- 
tinguishable by this test, differences between the chan- 
nels will not be resolvable. 

Despite this, our test was able to discern good from 
poor choices of reaction coordinate in a model system, 
and reject the extension coordinate as a good choice of 
coordinate for a DNA hairpin unless a strongly position- 
dependent diffusion constant is permitted. Even then, 
there are statistically significant discrepancies between 
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FIG. 3. Splitting probability analysis for a DNA hairpin in a passive all-optical constant force double trap. From left to 
right: Histogram of observed values of the extension coordinate; complete observed trajectory of extension coordinate over 
experimental timecourse; potential of mean force along extension coordinate estimated from histogram; splitting probabilities 
estimated directly from trajectory (solid line) and computed from the potential of mean force (dashed line) using Eq. 2. Dark 
shaded regions around solid lines represent a 68% symmetric confidence interval, and light shaded regions 95% confidence 
interval. Note that the bead-to-bead DNA hairpin extension coordinate (along the ordinate) is the same throughout all panels. 
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FIG. 4. Position-dependent diffusion constant and splitting 
probability test incorporating position-dependent diffusion 
for DNA hairpin. Left: Position-dependent diffusion con- 
stant, in nm/s^. Right: Splitting probability test incorporating 
position-dependent diffusion constant, with empirical split- 
ting probability pA shown as a thick dashed line. Because 
the Bayesian scheme of Best and Hummer [15] was used to 
compute potentials of mean force and diffusion constants, the 
estimated diffusion constant D{x) and PMF-derived splitting 
probabilities pa are shown as thin solid lines representing 20 
samples from the Bayesian posterior. 



the observed splitting probability and the PMF-derived 
splitting probability that indicate this reaction coordi- 
nate choice is not ideal. We note that the presence of ~ 
1 kb dsDNA handles tethering the DNA hairpin to the 
laser-trapped polystyrene beads is one potential source 
of the incomplete alignment of the extension coordi- 
nate w^ith the reaction coordinate for hairpin unzipping. 
Shorter dsDNA handles have recently been suggested as 
a way to improve the signal-to-noise ratio [25], and may 
also improve the reaction coordinate quality. For pro- 
teins, techniques that allows the attachment of tethers at 
specific attachment points can be exploited to probe for 
improved reaction coordinate should the experimenter 
find that the current pulling coordinate under study is 
unsuitably poor [26]. Finally, we note that though this 



test is able to test the suitability of the extension coor- 
dinate for a polymer under force, we cannot determine 
from the present analysis whether a good reaction coor- 
dinate in the presence of external force would also be a 
good reaction coordinate in the absence of force, or even 
under different biasing forces; this concern is still the 
subject of active study [27, 28]. 
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